Goto

Collaborating Authors

 stereotypical query



RUST: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models

Neural Information Processing Systems

To perform systematic evaluations, we set up 32 various tasks, including improvements to existing multimodal tasks, extension of text-only tasks to multimodal scenarios, and novel methods for risk assessment, which focus on models' basic performance with practical significance.